Abstract

Let $\mathcal{P}$ be an $n$-point subset of Euclidean space and let $d \ge 3$ be an integer. In this paper we study the following question: what is the smallest (normalized) relative change of the volume of subsets of $\mathcal{P}$ when it is projected into $\mathbb{R}^d$? We prove that there exists a linear mapping $f : \mathcal{P} \to \mathbb{R}^d$ that relatively preserves the volume of all subsets of size up to $\lfloor d/2 \rfloor$ within at most a factor of $O(n^{2/d}\sqrt{\log n \log\log n})$.
1. Introduction



A classical result of Johnson and Lindenstrauss [JL84] states that any $n$-point subset of Euclidean space can be projected into $O(\log n)$ dimensions while approximately preserving the metric structure of the set. A natural question is: what is the smallest distortion of an $n$-point subset of Euclidean space when it is projected into a fixed number $d$ of dimensions? This problem was first studied by Matoušek [Mat90], who proved an $O(n^{2/d}\sqrt{\log n/d})$ upper bound on the distortion by projecting the points into $\mathbb{R}^d$ using a random $d$-dimensional subspace. In Section 3 we re-prove Matoušek's result using the simplified analysis of [DG03, IM98] adapted to this setting, i.e., bounding the distortion for a fixed dimension instead of bounding the target dimension for a fixed distortion. Although the simplified proof of this result is well known and well understood, we hope that it is not redundant and that it helps the reader digest the following theorem.

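Theorem 1. Let $\mathcal{P}$ be an $n$-point subset of $\mathbb{R}^N$ and let $3 \le d \le c_3\log n$. Then there is a linear mapping $f : \mathcal{P} \to \mathbb{R}^d$ such that

$$\forall S \subseteq \mathcal{P},\ |S| \le \lfloor d/2 \rfloor:\quad 1 \le \left(\frac{\mathrm{Vol}(f(S))}{\mathrm{Vol}(S)}\right)^{1/(|S|-1)} \le c_4\, n^{2/d}\sqrt{\log n \log\log n},$$

where $c_3, c_4 > 0$ are absolute constants, and $\mathrm{Vol}(S)$ is the $(|S|-1)$-dimensional volume of the convex hull of $S$.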
Remark: The case where we fix the relative change of the volume of subsets to be arbitrarily close to one, and ask for the minimum dimension of such a mapping, was studied in [MZ08].

Notice that if we only require pairwise distances to be preserved, the best upper bound is $O(n^{2/d}\sqrt{\log n/d})$ (see Section 3); therefore our result can be thought of as a generalization of distance-preserving embeddings, since it also guarantees distance preservation (the case $|S| = 2$). Moreover, there exist $n$-point subsets of Euclidean space such that any embedding into $\mathbb{R}^d$ has distortion $\Omega(n^{1/\lfloor (d+1)/2 \rfloor})$ [Mat90], and thus the above worst-case upper bound cannot be improved much.

2. Preliminaries and Technical Lemmas 

We start by defining a (stochastic) ordering between two random variables $X$ and $Y$, but first let us motivate this definition. Assume that we have upper and lower bounds on the distribution function of $Y$, and that it is hard to give precise bounds on the distribution function of $X$. Using this notion of ordering, if $X$ is "smaller than" $Y$, then we can bound the "complicated" variable $X$ by bounding the "easy" variable $Y$. We use this notion extensively in this paper.

More formally, let $X$ and $Y$ be two random variables, not necessarily defined on the same probability space. The random variable $X$ is stochastically smaller than the random variable $Y$ when, for every $x \in \mathbb{R}$, the inequality

$$\mathbb{P}(X \le x) \ge \mathbb{P}(Y \le x) \qquad (1)$$

holds. We denote this by $X \preceq Y$.
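A finite grid of points can only falsify inequality (1), never prove it, but a quick numerical check helps build intuition for the direction of the ordering. The Python sketch below is our own illustration (the helper names `exp_cdf` and `is_stochastically_smaller` are hypothetical); it uses exponential distributions, whose CDF is $1 - e^{-\lambda x}$, and checks that the variable with rate 2 (mean 1/2) is stochastically smaller than the one with rate 1, but not conversely.

```python
import math


def exp_cdf(rate: float):
    """CDF of the exponential distribution with the given rate: x -> 1 - exp(-rate * x)."""
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def is_stochastically_smaller(cdf_x, cdf_y, grid, tol=1e-12):
    """Check inequality (1), P(X <= x) >= P(Y <= x), at every point of a finite grid.

    A True answer is only evidence: (1) must hold for every real x.
    """
    return all(cdf_x(x) >= cdf_y(x) - tol for x in grid)


grid = [0.05 * j for j in range(-20, 400)]
fast, slow = exp_cdf(2.0), exp_cdf(1.0)  # means 1/2 and 1
print(is_stochastically_smaller(fast, slow, grid))  # True
print(is_stochastically_smaller(slow, fast, grid))  # False
```

Only the two CDFs enter the check, which mirrors the remark that $X$ and $Y$ need not live on the same probability space.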

Next we recall known results about the Chi-square distribution and give bounds on its cumulative distribution function. If $X_i$, $i = 1, \dots, d$, are independent standard normal random variables, then the random variable $\chi^2_d = \sum_{i=1}^{d} X_i^2$ is a Chi-square random variable with $d$ degrees of freedom. Notice that the expected value of $\chi^2_d$ is $d$. It is well known [Fel71, Chapter II, p. 47] that the Chi-square distribution is a special case of the Gamma distribution and that its cumulative distribution function is given by

$$\mathbb{P}(\chi^2_d \le x) = \frac{\gamma(d/2,\, x/2)}{\Gamma(d/2)}, \qquad (2)$$

where $\Gamma(x)$ is the Gamma function, and $\gamma(a,x) = \int_0^x t^{a-1}e^{-t}\,dt$ and $\Gamma(a,x) = \int_x^\infty t^{a-1}e^{-t}\,dt$ are the lower and upper incomplete Gamma functions, respectively. Next we present some bounds on the Gamma and incomplete Gamma functions that we use in Sections 3 and 4. We start with the following bound on the Gamma function; see for instance [CD08, Lemmas 2.5, 2.6, 2.7] and [WW63, p. 253].



Lemma 1 (Stirling Bound on Gamma Function). If $\Gamma(a) = \int_0^\infty e^{-t}t^{a-1}\,dt$, where $a > 0$, then

$$\sqrt{2\pi}\, a^{a+1/2} e^{-a} < \Gamma(a+1) < \sqrt{2\pi}\, a^{a+1/2} e^{-a + 1/(12a)}. \qquad (3)$$
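The two-sided bound (3) is easy to spot-check against Python's `math.gamma`. The sketch below is ours (the function name `stirling_bounds` is hypothetical) and evaluates both sides for a few values of $a$; it illustrates a classical inequality and is not a proof of it.

```python
import math


def stirling_bounds(a: float) -> tuple:
    """Lower and upper bounds of Lemma 1 on Gamma(a + 1), valid for a > 0."""
    base = math.sqrt(2.0 * math.pi) * a ** (a + 0.5) * math.exp(-a)
    return base, base * math.exp(1.0 / (12.0 * a))


for a in [0.5, 1.0, 2.5, 10.0, 50.0]:
    lower, upper = stirling_bounds(a)
    value = math.gamma(a + 1.0)
    print(f"a = {a:5.1f}:  {lower:.6g} < Gamma(a+1) = {value:.6g} < {upper:.6g}")
    assert lower < value < upper
```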
Next we upper bound $\gamma(a,x)$. Note that $\gamma(a,x) = \int_0^x t^{a-1}e^{-t}\,dt \le \int_0^x t^{a-1}\,dt$, hence

$$\gamma(a,x) \le x^a/a. \qquad (4)$$
Now for the upper incomplete Gamma function, we have the following bound.

Lemma 2. If $\Gamma(a,x) = \int_x^\infty e^{-t}t^{a-1}\,dt$, where $x > 2(a+1)$, then

$$\Gamma(a,x) < 2\exp(-x)\,x^{a-1}. \qquad (5)$$

Proof. In [CD08, Lemma 2.6] set a = 1 and d = 2. □
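Bounds (4) and (5) can be spot-checked in the same way. The sketch below is our own (helper names hypothetical): it evaluates $\gamma(a,x)$ by its standard power series $x^a e^{-x}\sum_{n \ge 0} x^n/(a(a+1)\cdots(a+n))$ and, to avoid cancellation in the tail, evaluates $\Gamma(a,x)$ only for integer $a$, where $\Gamma(a,x) = (a-1)!\,e^{-x}\sum_{j=0}^{a-1} x^j/j!$.

```python
import math


def lower_inc_gamma(a: float, x: float) -> float:
    """gamma(a, x) via the series x^a e^(-x) sum_{n>=0} x^n / (a (a+1) ... (a+n)), for a > 0."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-17 * total and n < 10_000:
        term *= x / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(x) - x) * total


def upper_inc_gamma_int(a: int, x: float) -> float:
    """Gamma(a, x) for a positive integer a, via (a-1)! e^(-x) sum_{j=0}^{a-1} x^j / j!."""
    term, total = 1.0, 0.0
    for j in range(a):
        if j > 0:
            term *= x / j
        total += term
    return math.factorial(a - 1) * math.exp(-x) * total


# Equation (4): gamma(a, x) <= x^a / a.
for a in [0.5, 1.0, 1.5, 3.0, 5.0]:
    for x in [0.1, 0.5, 1.0, 2.0, 5.0]:
        assert lower_inc_gamma(a, x) <= x ** a / a

# Lemma 2, Equation (5): Gamma(a, x) < 2 e^(-x) x^(a-1) whenever x > 2(a + 1).
for a in range(1, 8):
    for x in [2 * (a + 1) + delta for delta in (0.01, 1.0, 5.0, 20.0)]:
        assert upper_inc_gamma_int(a, x) < 2.0 * math.exp(-x) * x ** (a - 1)

print("Equations (4) and (5) hold at every sampled point")
```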



It is well known [FB95, pp. 220-235] that the volume of the convex hull of a $k$-point subset of $\mathbb{R}^N$ together with the origin is equal to $\sqrt{\det(P^T P)}/k!$, where $P$ is the $N \times k$ matrix that contains the points as columns. The following lemma gives a connection between the volume of the convex hull of $k$ points and the determinant of a specific matrix that is constructed from these points.

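Lemma 3. Let $\mathcal{P} = \{p_1, p_2, \dots, p_k\}$ be a $k$-point subset of $\mathbb{R}^N$ in general position and let $f : \mathbb{R}^N \to \mathbb{R}^d$ be a linear mapping. Let $P := [p_2 - p_1, p_3 - p_1, \dots, p_k - p_1]$ be an $N \times (k-1)$ matrix. Then

$$\frac{\mathrm{Vol}(f(\mathcal{P}))}{\mathrm{Vol}(\mathcal{P})} = \left(\frac{\det\big((FP)^T FP\big)}{\det(P^T P)}\right)^{1/2}, \qquad (6)$$

where $F$ is the $d \times N$ matrix that corresponds to $f$.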

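Proof. By translating the point set $\mathcal{P}$, i.e., identifying $p_1$ with the origin, it follows that $\mathrm{Vol}(\mathcal{P}) = \sqrt{\det(P^T P)}/(k-1)!$, since the volume is translation invariant; similarly, since $f$ is linear, $\mathrm{Vol}(f(\mathcal{P})) = \sqrt{\det((FP)^T FP)}/(k-1)!$. Since $\mathcal{P}$ is in general position, $\det(P^T P) > 0$, and it follows that

$$\frac{\mathrm{Vol}(f(\mathcal{P}))}{\mathrm{Vol}(\mathcal{P})} = \left(\frac{\det\big((FP)^T FP\big)}{\det(P^T P)}\right)^{1/2}.$$

□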
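The Gram-determinant formula behind Lemma 3 is short to implement. The dependency-free Python sketch below is ours (helper names such as `simplex_volume` and `lemma3_ratio` are hypothetical; in practice one would use NumPy, e.g. `numpy.linalg.det`). It computes the $(k-1)$-dimensional volume of a $k$-point simplex as $\sqrt{\det(P^T P)}/(k-1)!$ and compares the volume ratio under a linear map with the right-hand side of (6).

```python
import math


def det(m):
    """Determinant of a square matrix (list of rows) by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return result


def gram(cols):
    """Gram matrix P^T P, where the columns of P are the vectors in `cols`."""
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]


def simplex_volume(points):
    """(k-1)-dimensional volume of the convex hull of k points: sqrt(det(P^T P)) / (k-1)!."""
    p1 = points[0]
    cols = [[x - y for x, y in zip(p, p1)] for p in points[1:]]  # columns p_i - p_1
    return math.sqrt(max(det(gram(cols)), 0.0)) / math.factorial(len(points) - 1)


def apply(F, x):
    """Matrix-vector product F x, with F given as a list of rows."""
    return [sum(f * xi for f, xi in zip(row, x)) for row in F]


def lemma3_ratio(F, points):
    """Right-hand side of (6): sqrt(det((FP)^T FP) / det(P^T P))."""
    p1 = points[0]
    cols = [[x - y for x, y in zip(p, p1)] for p in points[1:]]
    image_cols = [apply(F, col) for col in cols]
    return math.sqrt(det(gram(image_cols)) / det(gram(cols)))


triangle = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
F = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]  # a linear map R^3 -> R^2 that doubles the triangle's plane
image = [apply(F, p) for p in triangle]
print(simplex_volume(triangle), simplex_volume(image))  # 0.5 2.0
print(lemma3_ratio(F, triangle))  # 4.0
```

Because the map scales a 2-dimensional simplex by 2 in each direction, the area grows by $2^{k-1} = 4$, which both sides of (6) reproduce.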



Now, let us consider the above lemma in the setting where $f$ is a random linear mapping. More specifically, let $F$ be a Gaussian matrix, i.e., a matrix whose entries are i.i.d. standard Gaussian random variables. First observe that the ratio of the volumes is a random variable. Surprisingly enough, as the following lemma states, the distribution of this ratio is in this setting independent of $\mathcal{P}$. This can be thought of as a generalization of the 2-stability property of inner products with Gaussian random vectors to matrix multiplication with Gaussian matrices.

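Lemma 4. Let $\mathcal{P} = \{p_1, p_2, \dots, p_k\}$ be a $k$-point subset of $\mathbb{R}^N$ in general position, and let $f : \mathbb{R}^N \to \mathbb{R}^d$ be a random Gaussian linear mapping. Then

$$\left(\frac{\mathrm{Vol}(f(\mathcal{P}))}{\mathrm{Vol}(\mathcal{P})}\right)^2 \sim \prod_{i=1}^{k-1} \chi^2_{d-i+1}, \qquad (7)$$

where the Chi-square random variables on the right-hand side are independent.

Proof. It is a simple consequence of Lemma 3 and the above lemma. □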

Remark 1. For $k = 2$ in Lemma 4 we get $\|f(p_1) - f(p_2)\|^2/\|p_1 - p_2\|^2 \sim \chi^2_d$.
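Lemma 4 can be illustrated by a small Monte Carlo experiment. For a triangle ($k = 3$), the squared area ratio should be distributed as $\chi^2_d\,\chi^2_{d-1}$ with independent factors, whose mean is $d(d-1)$. The sketch below is our own illustration (helper names, the two test triangles, $d = 4$ and the seeds are assumptions); it only compares sample means with $d(d-1)$, i.e., it checks the first moment and the claimed independence of $\mathcal{P}$ at that level, not the full distribution.

```python
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def gram_det(u, v):
    """det of the 2x2 Gram matrix of u and v, i.e. the squared area of their parallelogram."""
    return dot(u, u) * dot(v, v) - dot(u, v) ** 2


def squared_area_ratio(F, triangle):
    """(Vol(f(P)) / Vol(P))^2 for a triangle P, computed as det((FP)^T FP) / det(P^T P)."""
    p1, p2, p3 = triangle
    cols = [[y - x for x, y in zip(p1, p2)], [y - x for x, y in zip(p1, p3)]]
    images = [[dot(row, col) for row in F] for col in cols]
    return gram_det(*images) / gram_det(*cols)


def mean_ratio(triangle, d, trials, seed):
    """Monte Carlo mean of the squared area ratio under a d x N matrix with i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    n_dim = len(triangle[0])
    total = 0.0
    for _ in range(trials):
        F = [[rng.gauss(0.0, 1.0) for _ in range(n_dim)] for _ in range(d)]
        total += squared_area_ratio(F, triangle)
    return total / trials


d = 4
flat = [[0.0] * 6, [1.0] + [0.0] * 5, [0.0, 1.0] + [0.0] * 4]
skewed = [[1.0, 2.0, 0.0, -1.0, 0.5, 3.0], [4.0, 0.0, 1.0, 1.0, -2.0, 0.0], [0.0, 1.0, 5.0, 2.0, 2.0, -1.0]]
print(mean_ratio(flat, d, 10_000, seed=0), mean_ratio(skewed, d, 10_000, seed=1), d * (d - 1))
```

Both sample means should land close to $d(d-1) = 12$ even though the two triangles have very different shapes and areas.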

Equation (7) gives the distribution of the squared volume ratio as a product of independent random variables. However, it is in general difficult to deal with such a product, so we employ the following theorem, which sandwiches this product between single Chi-square distributions.

Theorem 2 (Theorem 4, …). Let $U_i := \chi^2_{d-i+1}$, $i = 1, 2, \dots, s$, be independent Chi-square random variables. Then the following holds for every $s \ge 1$.

We now have enough tools at our disposal to prove Theorem 1.



3. Distance Distortion 

In this section we prove the following theorem.

Theorem 3. Let $\mathcal{P}$ be an $n$-point subset of $\mathbb{R}^N$ and let $3 \le d \le c_1\log n$, where $c_1$ is a positive constant. Then there exists a linear mapping $f : \mathbb{R}^N \to \mathbb{R}^d$ with (distance) distortion $\mathrm{dist}(f) = O(n^{2/d}\sqrt{\log n/d})$, i.e., there exists an absolute constant $c > 0$ such that

$$\forall x, y \in \mathcal{P}:\quad \|x - y\| \le \|f(x) - f(y)\| \le c\, n^{2/d}\sqrt{\log n/d}\,\|x - y\|.$$

Proof. As in [DG03, IM98], consider the random linear map $f : \mathbb{R}^N \to \mathbb{R}^d$, $f(x) := Rx$, where $R$ is a $d \times N$ random Gaussian matrix. Using the linearity of $f$ and Remark 1, it follows that $\|f(x) - f(y)\|^2/\|x - y\|^2 \sim \chi^2_d$ for every pair $x, y \in \mathcal{P}$. Our goal is to show that $\chi^2_d$ is sufficiently concentrated.

More specifically, it suffices to show that, with constant probability, $\chi^2_d$ does not fall outside an interval $[a^2, b^2]$ for some $a, b \in \mathbb{R}$. This amounts to upper bounding the probabilities $\mathbb{P}(\chi^2_d < a^2)$ and $\mathbb{P}(\chi^2_d > b^2)$.

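The elements of $\mathcal{P}$ determine at most $\binom{n}{2}$ distinct direction vectors. Applying the union bound over all pairs of $\mathcal{P}$ gives that if

$$\binom{n}{2}\Big(\mathbb{P}(\chi^2_d < a^2) + \mathbb{P}(\chi^2_d > b^2)\Big) < 1, \qquad (9)$$

then there exists $f$ such that $a\|x - y\| \le \|f(x) - f(y)\| \le b\|x - y\|$ for all $x, y \in \mathcal{P}$, i.e., $f$ expands every distance in $\mathcal{P}$ by at most a factor $b$ and contracts it by at most a factor $1/a$, so $\mathrm{dist}(f) \le b/a$. Our goal therefore is to specify $a, b$ in terms of $d$ and $n$ such that Inequality (9) holds. To do so, we first bound $\Gamma(d/2)$ from below, which will be used later. By Lemma 1 (applied with $d/2 - 1$ in place of $a$) we have $\Gamma(d/2) \ge e^{-d/2}(d-2)^{(d-1)/2}/2^{d/2}$. Now we bound $a$ and $b$ separately.

We find $a$ such that $\binom{n}{2}\mathbb{P}(\chi^2_d < a^2) < 1/2$. By Equations (2) and (4) and the previous bound, $\mathbb{P}(\chi^2_d < a^2) = \gamma(d/2, a^2/2)/\Gamma(d/2) \le 2(ea^2)^{d/2}/\big(d\,(d-2)^{(d-1)/2}\big)$, so we require that

$$\frac{n^2 e^{d/2} a^d}{d\,(d-2)^{(d-1)/2}} < \frac{1}{2},$$

which holds for all $d \ge 3$ if we set $a = c_2\sqrt{d}/n^{2/d}$, where $c_2 > 0$ is a sufficiently small absolute constant. Similarly, we find $b$ such that $\binom{n}{2}\mathbb{P}(\chi^2_d > b^2) < 1/2$. Using Lemmas 1 and 2 (the latter with $d/2$ in place of $a$ and $x = b^2/2$), and assuming for the moment that $b^2 > 2d + 4$, we have that

$$\mathbb{P}(\chi^2_d > b^2) = \frac{\Gamma(d/2, b^2/2)}{\Gamma(d/2)} \le \frac{4\,e^{d/2}e^{-b^2/2}\,b^{d-2}}{(d-2)^{(d-1)/2}}.$$

Since $\binom{n}{2} < n^2/2$, it suffices to show that $\ln\big(4n^2 e^{d/2}e^{-b^2/2}b^{d-2}/(d-2)^{(d-1)/2}\big)$ is negative for large enough $n$. Indeed,

$$\ln\left(\frac{4n^2 e^{d/2}e^{-b^2/2}\,b^{d-2}}{(d-2)^{(d-1)/2}}\right) = \ln 4 + 2\ln n + (d-2)\ln b - \frac{b^2}{2} + \frac{d}{2} - \frac{d-1}{2}\ln(d-2).$$

Note that if $d > d'$ then $\mathbb{P}(\chi^2_{d'} > b^2) \le \mathbb{P}(\chi^2_d > b^2)$. Thus we can assume that $d = c_1\log n$, since if we can bound this probability for $d = c_1\log n$, then we can bound it for every fixed $d \le c_1\log n$. Define $g(b, n) = \ln 4 + 2\ln n + (d-2)\ln b - b^2/2 + d/2 - \frac{d-1}{2}\ln(d-2)$. We want to show that $g(b, n) < 0$ for large enough $n$. Choose $b = 5\sqrt{\ln n}$; recalling that $d = c_1\log n$ with $c_1$ a sufficiently small constant, we have $b^2 > 2d + 4$ for large enough $n$, and we conclude that $\lim_{n \to \infty} g(5\sqrt{\ln n}, n) = -\infty$, as desired. Hence, we can choose $a, b$ as functions of $n$ such that

$$\frac{b}{a} = \frac{5\sqrt{\ln n}\; n^{2/d}}{c_2\sqrt{d}} = c\, n^{2/d}\sqrt{\log n/d}. \qquad \square$$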
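The choices of $a$ and $b$ in the proof above can be sanity-checked numerically. The following sketch is ours, not the paper's: it fixes the illustrative constants $c_1 = 1$ and $c_2 = 0.3$ (assumptions standing in for the unspecified "sufficiently small" constants) and checks, for a few values of $n$ and every $3 \le d \le \ln n$, that $n^2 e^{d/2}a^d/(d(d-2)^{(d-1)/2}) < 1/2$ for $a = c_2\sqrt{d}/n^{2/d}$, that $b^2 > 2d + 4$, and that $g(b, n) < 0$ for $b = 5\sqrt{\ln n}$. A finite check like this illustrates the argument; it does not replace it.

```python
import math


def contraction_lhs(n: int, d: int, c2: float) -> float:
    """n^2 e^(d/2) a^d / (d (d-2)^((d-1)/2)) with a = c2 sqrt(d) / n^(2/d), computed in log-space."""
    log_a = math.log(c2) + 0.5 * math.log(d) - (2.0 / d) * math.log(n)
    log_lhs = (2.0 * math.log(n) + d / 2.0 + d * log_a
               - math.log(d) - (d - 1) / 2.0 * math.log(d - 2))
    return math.exp(log_lhs)


def g(n: int, d: int) -> float:
    """g(b, n) from the proof of Theorem 3, evaluated at b = 5 sqrt(ln n)."""
    b = 5.0 * math.sqrt(math.log(n))
    return (math.log(4.0) + 2.0 * math.log(n) + (d - 2) * math.log(b) - b * b / 2.0
            + d / 2.0 - (d - 1) / 2.0 * math.log(d - 2))


def distortion_bound(n: int, d: int, c2: float) -> float:
    """The resulting distortion b / a = 5 sqrt(ln n) n^(2/d) / (c2 sqrt(d))."""
    return 5.0 * math.sqrt(math.log(n)) * n ** (2.0 / d) / (c2 * math.sqrt(d))


c2 = 0.3  # illustrative value for the "sufficiently small" constant c_2
for n in [10**3, 10**6, 10**9]:
    for d in range(3, int(math.log(n)) + 1):  # c_1 = 1, so 3 <= d <= ln n
        assert contraction_lhs(n, d, c2) < 0.5
        assert 25.0 * math.log(n) > 2 * d + 4  # the assumption b^2 > 2d + 4
        assert g(n, d) < 0.0
    print(n, round(distortion_bound(n, 3, c2), 1))
```

Note that the factor $n^2$ cancels exactly in `contraction_lhs`, which is why the choice $a \propto n^{-2/d}$ produces the $n^{2/d}$ term in the distortion.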

4. Proof of Main Theorem 

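Our goal is to find a mapping $f : \mathcal{P} \to \mathbb{R}^d$ such that

$$\forall S \subseteq \mathcal{P},\ |S| \le k:\quad 1 \le \left(\frac{\mathrm{Vol}(f(S))}{\mathrm{Vol}(S)}\right)^{1/(|S|-1)} \le D, \qquad (10)$$

where $D$ is the volume distortion of the mapping. We will see in the analysis below that we can set $k = \lfloor d/2 \rfloor$ and $D = O(n^{2/d}\sqrt{\log n \log\log n})$. We can assume w.l.o.g. that the input points are in general position, i.e., every subset of size up to $k$ is affinely independent. If not, both the original points and the projected points of an affinely dependent subset span zero volume.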

Similarly to Section 3, we take a random $f$ given by a Gaussian random matrix and show that it satisfies (10) with constant probability. To do so, we first bound the probability that a fixed subset "contracts" its volume by more than a factor $a$.

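Lemma 5. Fix any subset $S \subseteq \mathcal{P}$ of size $|S| = s + 1$ with $1 \le s \le k$. Then

$$\mathbb{P}\left(\left(\frac{\mathrm{Vol}(f(S))}{\mathrm{Vol}(S)}\right)^{1/s} \le a\right) \le \frac{2\,(esa^2)^{t/2}}{t\,(t-2)^{(t-1)/2}},$$

where $t = s(d - s + 1)$.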

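Proof. By Lemma 4, the above probability is equal to $\mathbb{P}\big((\prod_{i=1}^{s}\chi^2_{d-i+1})^{1/s} \le a^2\big)$. Using Theorem 2, we can bound this probability of a product of Chi-square random variables by one involving a single Chi-square random variable. More specifically, by the stochastic ordering we have the inequality

$$\mathbb{P}\left(\Big(\prod_{i=1}^{s}\chi^2_{d-i+1}\Big)^{1/s} \le a^2\right) \le \mathbb{P}\big(\chi^2_t \le sa^2\big)$$

for every $1 \le s \le k$. Now we have a single Chi-square random variable, and we can bound it from above in the same way as in Section 3, using Lemma 1 and Equation (4). It follows that

$$\mathbb{P}\big(\chi^2_t \le sa^2\big) = \frac{\gamma(t/2,\, sa^2/2)}{\Gamma(t/2)} \le \frac{2\,(esa^2)^{t/2}}{t\,(t-2)^{(t-1)/2}}. \qquad \square$$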
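The last step of this proof, bounding $\mathbb{P}(\chi^2_t \le sa^2)$ by $2(esa^2)^{t/2}/(t(t-2)^{(t-1)/2})$, can be compared with the exact Chi-square CDF from Equation (2). The sketch below is our own (helper names and the sampled ranges of $d$, $s$ and $a$ are assumptions); it does not test Theorem 2 itself, only the single-Chi-square bound that follows it.

```python
import math


def chi2_cdf(x: float, k: int) -> float:
    """P(chi^2_k <= x) = gamma(k/2, x/2) / Gamma(k/2), Equation (2), via the power series of gamma."""
    a, z = k / 2.0, x / 2.0
    if z <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-17 * total and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def lemma5_bound(s: int, d: int, a: float) -> float:
    """2 (e s a^2)^(t/2) / (t (t-2)^((t-1)/2)) with t = s (d - s + 1), computed in log-space."""
    t = s * (d - s + 1)
    log_bound = (math.log(2.0) + (t / 2.0) * math.log(math.e * s * a * a)
                 - math.log(t) - (t - 1) / 2.0 * math.log(t - 2))
    return math.exp(log_bound)


for d in range(3, 13):
    for s in range(1, d // 2 + 1):
        t = s * (d - s + 1)
        for a in [0.05, 0.1, 0.3, 0.5]:
            exact = chi2_cdf(s * a * a, t)
            assert exact <= lemma5_bound(s, d, a), (d, s, a)
print("the bound dominates P(chi^2_t <= s a^2) at every sampled point")
```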

Similarly, we bound the probability that a fixed subset "expands" its volume by more than a factor $b$.

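Lemma 6. Fix any subset $S \subseteq \mathcal{P}$ of size $|S| = s + 1$ with $1 \le s \le k$. If $sb^2 > 2l + 4$, then

$$\mathbb{P}\left(\left(\frac{\mathrm{Vol}(f(S))}{\mathrm{Vol}(S)}\right)^{1/s} \ge b\right) \le \frac{4\,e^{(l - sb^2)/2}\,(sb^2)^{l/2 - 1}}{(l-2)^{(l-1)/2}},$$

where $l = s(d - s + 1) + s(s-1)/2$.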

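Proof. As in the previous lemma, the above probability is equal to $\mathbb{P}\big((\prod_{i=1}^{s}\chi^2_{d-i+1})^{1/s} \ge b^2\big)$, and again using Theorem 2 it follows that

$$\mathbb{P}\left(\Big(\prod_{i=1}^{s}\chi^2_{d-i+1}\Big)^{1/s} \ge b^2\right) \le \mathbb{P}\big(\chi^2_l \ge sb^2\big).$$

Using Lemmas 1 and 2, it follows that

$$\mathbb{P}\big(\chi^2_l \ge sb^2\big) = \frac{\Gamma(l/2,\, sb^2/2)}{\Gamma(l/2)} \le \frac{4\,e^{(l - sb^2)/2}\,(sb^2)^{l/2 - 1}}{(l-2)^{(l-1)/2}}. \qquad \square$$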
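The final tail bound $\mathbb{P}(\chi^2_l \ge sb^2) \le 4e^{(l-sb^2)/2}(sb^2)^{l/2-1}/(l-2)^{(l-1)/2}$, with $l = s(d-s+1) + s(s-1)/2$ and $sb^2 > 2l + 4$, can be compared against an accurate tail probability. The sketch below is ours (helper names and sampled parameters are assumptions). It evaluates the regularized upper incomplete Gamma function with the classical Legendre continued fraction (modified Lentz method, as in standard numerical references), which converges well here because the hypothesis of Lemma 6 forces $x/2 > l/2 + 1$.

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """P(chi^2_k >= x) = Gamma(k/2, x/2) / Gamma(k/2), via the continued fraction (for x/2 > k/2 + 1)."""
    a, z = k / 2.0, x / 2.0
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    dd = 1.0 / b
    h = dd
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        dd = an * dd + b
        if abs(dd) < tiny:
            dd = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        dd = 1.0 / dd
        delta = dd * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(-z + a * math.log(z) - math.lgamma(a)) * h


def lemma6_bound(s: int, d: int, b2: float) -> float:
    """4 e^((l - s b^2)/2) (s b^2)^(l/2 - 1) / (l - 2)^((l-1)/2) with l = s(d-s+1) + s(s-1)/2."""
    l = s * (d - s + 1) + s * (s - 1) // 2
    x = s * b2
    log_bound = (math.log(4.0) + (l - x) / 2.0 + (l / 2.0 - 1.0) * math.log(x)
                 - (l - 1) / 2.0 * math.log(l - 2))
    return math.exp(log_bound)


for d in range(3, 13):
    for s in range(1, d // 2 + 1):
        l = s * (d - s + 1) + s * (s - 1) // 2
        for m in [1.01, 1.5, 2.0, 3.0]:
            b2 = m * (2 * l + 4) / s  # guarantees s b^2 > 2l + 4, the hypothesis of Lemma 6
            assert chi2_sf(s * b2, l) <= lemma6_bound(s, d, b2), (d, s, m)
print("the bound dominates P(chi^2_l >= s b^2) at every sampled point")
```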

Notice that if $d' > d$, then $\sum_{i=1}^{s}\chi^2_{d-i+1} \preceq \sum_{i=1}^{s}\chi^2_{d'-i+1}$, by the stochastic ordering of the Chi-square distributions. Now we are ready to apply the union bound. Our goal is to find $a$ such that, with probability at least $1/2$, our embedding does not contract the volumes of subsets of size up to $k + 1$ by a factor $a$.

By the union bound over all sets of fixed size $i + 1$, $1 \le i \le k$, we want to find $a$ such that

$$\binom{n}{i+1}\,\frac{2\,(eia^2)^{t_i/2}}{t_i\,(t_i-2)^{(t_i-1)/2}} < \frac{1}{2k},$$

where $t_i = i(d - i + 1)$. Note that if we sum over all different sizes of subsets ($i = 1, \dots, k$), we get that the failure probability is less than $1/2$. Using $\binom{n}{i+1} \le n^{i+1}/i! \le n^{i+1}e^{i}/i^{i}$, it suffices to show that for large enough $n$, for every $1 \le i \le k$ and $d \ge 3$, the following quantity is negative:

$$\ln 4 + \ln k + (i+1)\ln n + t_i\ln a + \Big(\frac{t_i}{2} - i\Big)\ln i + \Big(\frac{t_i}{2} + i\Big) - \ln t_i - \frac{t_i - 1}{2}\ln(t_i - 2).$$
Let us group the terms of the above quantity and bound them individually. It is not hard to see that $(t_i/2 - i)\ln i - \frac{t_i - 1}{2}\ln(t_i - 2) \le 0$ and $\ln k - \ln t_i \le 0$, since $k < d \le t_i$ and $t_i = i(d - i + 1) \ge i + 2$ when $i = 1, \dots, k$ and $d \ge 3$. Hence, it suffices to show that

$$\ln 4 + (i+1)\ln n + t_i\ln a + \Big(\frac{t_i}{2} + i\Big) < 0.$$

Set $a = c_6 n^{-\gamma}$, for some positive $\gamma$ that will be specified shortly and a sufficiently small positive constant $c_6$. Recall that we want the above inequality to hold for every $1 \le i \le k$. We can choose $c_6$ smaller than $e^{-2}$ to take care of the $t_i/2 + i + \ln 4$ term. Let us now focus on the dominant term $(i+1)\ln n$. It follows that the above quantity is negative if $\gamma \ge \frac{i+1}{i(d-i+1)}$ for all $i = 1, \dots, k$.
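Let us study more closely the function $h_d(x) = \frac{x+1}{x(d-x+1)}$. We will show that $h_d(x)$ is convex on the domain $[1, d]$ and increasing on the domain $[d/2, d]$ for any fixed $d \ge 3$. Convexity follows from the partial fraction decomposition $h_d(x) = \frac{1}{d+1}\big(\frac{1}{x} + \frac{d+2}{d+1-x}\big)$, both of whose terms are convex on $(0, d+1)$; moreover, the only critical point of $h_d$ in $(0, d+1)$ is $x = \sqrt{d+2} - 1 < d/2$, so $h_d'(x) > 0$ for $x \in [d/2, d]$. Also note that $h_d(1) = h_d(d/2) = 2/d$. By convexity on $[1, d/2]$, we get that $h_d(x) \le 2/d$ for all $x \in [1, d/2]$, so $\gamma = 2/d$ suffices.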

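The above analysis also gives a bound on the parameter $k$, i.e., the maximum size of the subsets that we can consider: since $h_d$ is increasing on $[d/2, d]$, we have $h_d(x) > 2/d$ for $x > d/2$. Thus, with $\gamma = 2/d$, $k$ must be less than or equal to $\lfloor d/2 \rfloor$.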
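The facts used about $h_d$ (the endpoint values $h_d(1) = h_d(d/2) = 2/d$, the bound $h_d \le 2/d$ on $[1, d/2]$, the fact that $h_d > 2/d$ just beyond $d/2$, and the partial fraction identity $h_d(x) = \frac{1}{d+1}\big(\frac{1}{x} + \frac{d+2}{d+1-x}\big)$ behind convexity) can be checked in exact rational arithmetic with Python's `fractions` module. The sketch below is our own grid check for $3 \le d \le 40$, not a proof.

```python
from fractions import Fraction


def h(d: int, x: Fraction) -> Fraction:
    """h_d(x) = (x + 1) / (x (d - x + 1))."""
    return (x + 1) / (x * (d - x + 1))


for d in range(3, 41):
    half = Fraction(d, 2)
    two_over_d = Fraction(2, d)
    # Equal endpoint values: h_d(1) = h_d(d/2) = 2/d.
    assert h(d, Fraction(1)) == two_over_d and h(d, half) == two_over_d
    # Partial fractions: h_d(x) = (1/(d+1)) (1/x + (d+2)/(d+1-x)), checked at x = 3/2.
    x = Fraction(3, 2)
    assert h(d, x) == Fraction(1, d + 1) * (1 / x + Fraction(d + 2) / (d + 1 - x))
    # On a grid of [1, d/2] the value never exceeds 2/d ...
    grid = [1 + (half - 1) * Fraction(j, 50) for j in range(51)]
    assert max(h(d, y) for y in grid) <= two_over_d
    # ... while just beyond d/2 it does, which is what limits k to floor(d/2).
    assert h(d, half + Fraction(1, 10)) > two_over_d
print("h_d(x) <= 2/d on [1, d/2] for d = 3, ..., 40 (grid check)")
```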

To sum up, we have proved that if $a = c_6 n^{-2/d}$, then with probability at least $1/2$ our embedding does not contract the normalized volumes of subsets of size at most $\lfloor d/2 \rfloor$ by more than a multiplicative factor of $a$.
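The union-bound requirement for contraction can also be checked directly, without the simplifications used above. The sketch below is ours (helper names and the choice $c_6 = e^{-2}$ are assumptions); it sums $\binom{n}{i+1}$ times the bound $2(eia^2)^{t_i/2}/(t_i(t_i-2)^{(t_i-1)/2})$, with $t_i = i(d-i+1)$, over $i = 1, \dots, \lfloor d/2 \rfloor$ for $a = c_6 n^{-2/d}$, and checks that the total stays below $1/2$ for a few pairs $(n, d)$.

```python
import math


def lemma5_log_bound(i: int, d: int, log_a: float) -> float:
    """log of 2 (e i a^2)^(t/2) / (t (t-2)^((t-1)/2)) with t = i (d - i + 1)."""
    t = i * (d - i + 1)
    return (math.log(2.0) + (t / 2.0) * (1.0 + math.log(i) + 2.0 * log_a)
            - math.log(t) - (t - 1) / 2.0 * math.log(t - 2))


def contraction_failure_bound(n: int, d: int, c6: float = math.exp(-2.0)) -> float:
    """sum_{i=1}^{floor(d/2)} C(n, i+1) * bound_i for a = c6 n^(-2/d)."""
    log_a = math.log(c6) - (2.0 / d) * math.log(n)
    total = 0.0
    for i in range(1, d // 2 + 1):
        total += math.exp(math.log(math.comb(n, i + 1)) + lemma5_log_bound(i, d, log_a))
    return total


for n in [100, 10**4, 10**6]:
    for d in range(3, 21):
        assert contraction_failure_bound(n, d) < 0.5, (n, d)
print("the contraction failure bound is below 1/2 at every sampled (n, d)")
```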

Next, our goal is to find $b$ such that, with probability at least $1/2$, $f$ does not expand volumes by more than a factor of $b$. Let $l_i = i(d - i + 1) + i(i-1)/2$. We apply the union bound over all sets of fixed size $i + 1$, $1 \le i \le k$, together with Lemma 6, assuming for the moment that $ib^2 > 4l_i + 8$. We want to find $b$ such that

$$\binom{n}{i+1}\,\frac{4\,e^{(l_i - ib^2)/2}\,(ib^2)^{l_i/2 - 1}}{(l_i - 2)^{(l_i - 1)/2}} < \frac{1}{2k}.$$

Summing over all different sizes of subsets, we get the desired property with probability at least $1/2$.

It suffices to show that

$$\ln\left(2k\binom{n}{i+1}\frac{4\,e^{(l_i - ib^2)/2}\,(ib^2)^{l_i/2 - 1}}{(l_i - 2)^{(l_i - 1)/2}}\right)$$

is negative for every $1 \le i \le k$ and $d \in [3, c_3\log n]$. Similarly to Section 3, we can assume without loss of generality that $d = c_3\log n$, using the fact that if $d' \le d$ then $\sum_{i=1}^{s}\chi^2_{d'-i+1} \preceq \sum_{i=1}^{s}\chi^2_{d-i+1}$.

Now, since there are at most $\binom{n}{i+1} \le n^{i+1}$ subsets of size $i + 1$, it suffices to show that the following quantity is negative:

$$\ln 8 + \ln k + (i+1)\ln n + \Big(\frac{l_i}{2} - 1\Big)\ln(ib^2) + \frac{l_i}{2} - \frac{ib^2}{2} - \frac{l_i - 1}{2}\ln(l_i - 2).$$
Note that in the last quantity the positive terms are of order $O(id\ln(ib^2) + i\ln n)$, while the dominant negative term is $-ib^2/2$. Recall that $i \le d = c_3\log n$. It is not hard to see that by choosing $b = c_7\sqrt{\log n \log\log n}$, where $c_7 > 0$ is a sufficiently large constant, we have $ib^2 > 4l_i + 8$ and the above quantity goes to $-\infty$ as $n$ grows, for every $1 \le i \le k$.

To sum up, we have proved that with probability at least $1/2$, $f$ does not expand the normalized volumes of subsets of size at most $\lfloor d/2 \rfloor$ by more than a multiplicative factor of $b$.
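As with the contraction side, the expansion requirement can be checked numerically for concrete parameters. The sketch below is ours and rests on illustrative assumptions: $c_3 = 1$ (so $3 \le d \le \ln n$) and $c_7 = 3$ for the "sufficiently large" constant. For each sampled $(n, d)$ it verifies the assumption $ib^2 > 4l_i + 8$ and that $\sum_{i=1}^{\lfloor d/2 \rfloor}\binom{n}{i+1}\cdot 4e^{(l_i - ib^2)/2}(ib^2)^{l_i/2-1}/(l_i-2)^{(l_i-1)/2} < 1/2$ for $b = c_7\sqrt{\ln n\,\ln\ln n}$.

```python
import math


def lemma6_log_bound(i: int, d: int, b2: float) -> float:
    """log of 4 e^((l - i b^2)/2) (i b^2)^(l/2 - 1) / (l - 2)^((l-1)/2), l = i(d-i+1) + i(i-1)/2."""
    l = i * (d - i + 1) + i * (i - 1) // 2
    x = i * b2
    return (math.log(4.0) + (l - x) / 2.0 + (l / 2.0 - 1.0) * math.log(x)
            - (l - 1) / 2.0 * math.log(l - 2))


def expansion_failure_bound(n: int, d: int, c7: float = 3.0) -> float:
    """sum_{i=1}^{floor(d/2)} C(n, i+1) * bound_i for b = c7 sqrt(ln n ln ln n)."""
    b2 = c7 * c7 * math.log(n) * math.log(math.log(n))
    total = 0.0
    for i in range(1, d // 2 + 1):
        l = i * (d - i + 1) + i * (i - 1) // 2
        assert i * b2 > 4 * l + 8  # the assumption made before applying Lemma 6
        total += math.exp(math.log(math.comb(n, i + 1)) + lemma6_log_bound(i, d, b2))
    return total


for n in [10**4, 10**6, 10**9]:
    for d in range(3, int(math.log(n)) + 1):  # c_3 = 1, so 3 <= d <= ln n
        assert expansion_failure_bound(n, d) < 0.5, (n, d)
print("the expansion failure bound is below 1/2 at every sampled (n, d)")
```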

Rescaling $f$ by a factor $1/a$, we conclude that there exist $a, b$ with $a < b$ such that

$$\mathbb{P}\left(\forall S \subseteq \mathcal{P},\ |S| \le \lfloor d/2 \rfloor:\ 1 \le \left(\frac{\mathrm{Vol}(f(S))}{\mathrm{Vol}(S)}\right)^{1/(|S|-1)} \le \frac{b}{a}\right) > 0.$$

Since $b/a = (c_7/c_6)\, n^{2/d}\sqrt{\log n \log\log n}$, this concludes the proof of Theorem 1.
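The statement of Theorem 1 can also be explored empirically. The sketch below is an illustration under our own assumptions: a single draw of a Gaussian $d \times N$ matrix, $n = 12$ random Gaussian points in $\mathbb{R}^{15}$, $d = 6$, seed 42, and hypothetical helper names. For every subset with $2 \le |S| \le \lfloor d/2 \rfloor$ it computes the normalized ratio $(\mathrm{Vol}(f(S))/\mathrm{Vol}(S))^{1/(|S|-1)}$, reports the empirical distortion (largest over smallest ratio), and checks that rescaling $f$ by $1/a$ (with $a$ the smallest ratio) makes every normalized ratio at least 1, as in the last step of the proof. It says nothing about the constants or the asymptotic rate; in practice one would use NumPy (e.g. `numpy.linalg.det`) instead of the hand-written determinant.

```python
import itertools
import math
import random


def det(m):
    """Determinant of a square matrix (list of rows) by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return result


def volume(points):
    """(|S|-1)-dimensional volume of the convex hull of the points, via the Gram determinant."""
    p1 = points[0]
    cols = [[x - y for x, y in zip(p, p1)] for p in points[1:]]
    g = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    return math.sqrt(max(det(g), 0.0)) / math.factorial(len(points) - 1)


def normalized_ratios(points, F, max_size):
    """(Vol(f(S)) / Vol(S))^(1/(|S|-1)) for every subset S with 2 <= |S| <= max_size."""
    images = [[sum(f * x for f, x in zip(row, p)) for row in F] for p in points]
    ratios = []
    for size in range(2, max_size + 1):
        for idx in itertools.combinations(range(len(points)), size):
            ratio = volume([images[j] for j in idx]) / volume([points[j] for j in idx])
            ratios.append(ratio ** (1.0 / (size - 1)))
    return ratios


rng = random.Random(42)
n, N, d = 12, 15, 6
points = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(n)]
F = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(d)]
ratios = normalized_ratios(points, F, max_size=d // 2)
a, b = min(ratios), max(ratios)
print(f"empirical volume distortion b/a = {b / a:.2f}")

# Rescaling f by 1/a scales every normalized ratio by exactly 1/a, so the smallest becomes 1.
F_scaled = [[x / a for x in row] for row in F]
assert min(normalized_ratios(points, F_scaled, max_size=d // 2)) > 1.0 - 1e-9
```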